The Miscellaneous Crenarchaeota group (MCG) Archaea is one of the predominant archaeal groups 
in anoxic environments and may have significant roles in the global biogeochemical cycles. 
However, no isolate of MCG has been cultivated or characterized to date. In this study, we 
investigated the genetic organization, ecophysiological properties and evolutionary relationships of 
MCG archaea with other archaeal members using metagenome information and the result of gene 
expression experiments. A comparison of the gene organizations and similarities around the 16S 
rRNA genes from all available MCG fosmid and cosmid clones revealed no significant synteny 
among genomic fragments, demonstrating that there are large genetic variations within members of 
the MCG. Phylogenetic analyses of large-subunit + small-subunit rRNA, concatenated ribosomal 
protein genes and the topoisomerase IB gene (TopoIB) all demonstrate that MCG constitutes a sister 
lineage to the newly proposed archaeal phylum Aigarchaeota and to Thaumarchaeota. Genes involved 
in protocatechuate degradation and chemotaxis were found in the MCG fosmid 75G8 genome 
fragment, suggesting that this MCG member may have a role in the degradation of aromatic 
compounds. Moreover, the expression of a putative 4-carboxymuconolactone decarboxylase was 
observed when the sediment was supplemented with protocatechuate, further supporting the 
hypothesis that this MCG member degrades aromatic compounds. 
Introduction 

The study of the ecophysiology of archaea is currently 
one of the most exciting research areas in the field of 
environmental microbiology. Many uncultivated 
archaeal groups have been discovered as microbial 
diversity surveys have expanded and improved, but 
the physiological properties of most of these uncultivated 
archaea remain to be determined. For instance, 
uncultivated archaeal lineages such as Marine 
benthic group B (also known as deep-sea archaeal 
group), Miscellaneous Crenarchaeota group (MCG) and 
South African gold mine Euryarchaeota group were 
found to be widespread in marine sediments (Teske and 
Sørensen, 2008); however, the functions of these 
archaea in the environment are still unknown. 

MCG archaea live in diverse habitats, including 
terrestrial and marine, hot and cold, surface and 
subsurface environments (Biddle et al., 2006; Teske, 
2006; Kubo et al., 2012). The label 'miscellaneous' 
appears to reflect the difficulty of categorizing 
the wide terrestrial and marine habitat range of this 
group (Inagaki et al., 2003). Sørensen and Teske 
(2006) divided hundreds of MCG clones into smaller 
and more manageable subgroups, MCG-1 to MCG-4. 
Jiang and Li (2011) performed a comprehensive 
phylogenetic analysis and divided MCG archaea 
into seven subgroups (MCG-A to MCG-G), whereas 
Kubo et al. (2012) divided MCG archaea into 17 
subgroups. In addition to its cosmopolitan distribution, 
the MCG group of archaea is one of the most 
abundant groups in the subsurface sedimentary 
biosphere based on the 16S rRNA gene abundance: 
the MCG clones account for 33% of all clones from 
47 16S rRNA gene libraries obtained from 11 
published studies of the deep marine biosphere 
(Fry et al., 2008). Moreover, the MCG was found to 
be one of the most active groups in the deep marine 
biosphere (Fry et al., 2008; Li et al., 2012b). Parallel 
16S rRNA and rDNA analyses of Ocean Drilling 
Program site 1229 on the Peru Margin indicated 
that the MCG dominated the archaeal clone 
libraries based on PCR-amplified 16S rDNA genes 
(Parkes et al., 2005) and on reverse-transcribed 16S 
rRNA (Biddle et al., 2006). At the Ocean Drilling 
Program site 1227 on the Peru Margin, MCG archaea 
were abundant in 16S rRNA gene clone libraries 
from all depths (Inagaki et al., 2006) and they 
dominated the reverse-transcribed 16S rRNA pool in 
all sediment layers except the deep-sea archaeal 
group/Marine benthic group B horizon (Sørensen 
and Teske, 2006). In addition, the carbon isotope 
signatures of archaeal cells and polar lipids from 
MCG-dominated sediment horizons indicate that 
these anaerobes utilize buried organic carbon substrates 
(Biddle et al., 2006). The widespread distribution, 
high abundance and metabolic activities 
of MCG archaea all indicate that these organisms 
might be significant players in biogeochemical 
cycles. However, the paucity of representative pure 
cultures has hindered our understanding of the 
physiological properties of these archaea as well as 
their ecological functions and evolutionary position. 
Environmental genomics provides an approach to 
explore the potential physiological characteristics 
and genomic information of uncultivated microbes 
in the context of indigenous microbial communities. 
Recently, single-cell genome analysis suggested 
that members of the MCG archaea specialize in 
extracellular protein degradation (Lloyd et al., 
2013). To date, only a few MCG fosmid and cosmid 
clones have been identified. One MCG fosmid clone 
was reported to contain a functional bacteriochlorophyll 
a synthase (bchG) gene, which encodes a key enzyme for 
bacteriochlorophyll a biosynthesis. However, the 
in vivo physiological functions of BchG in MCG are 
still unknown, although it was proposed that 
carrying a presumptive Bchl a synthase gene 
may give the archaea more flexibility to survive in or 
adapt to various environments (Meng et al., 2009). 
The other three analyzed fosmid clones contain 
homologs of potentially important functional 
genes involved in lipid biosynthesis, energy metabolism 
and resistance to oxidants (Li et al., 2012a). 
However, the physiological properties of these organisms 
and their roles in natural biogeochemical cycles 
remain to be determined. 

In this study, we investigated the phylogenetic 
position and potential ecophysiological properties of 
this little-understood MCG archaeal group using an 
environmental metagenomic method. A member of 
the MCG was hypothesized to be an aromatic compound 
degrader based on genome information. This hypothesis 
was further supported by target gene expression 
analysis after substrate supplementation. 



Materials and methods 

Site description and sampling 

Estuarine sediment was collected from a site of 
around 0.5 m water depth on Qi'ao Island 
(Pearl River Estuary, 22°27'21.4''N, 113°38'7.3''E) 
in Guangdong Province, China, in April 2005 using 
a single-core sampler. The temperature of the bottom 
water was 21.5 °C and the salinity at the surface 
of the sediment was 2.6%. Mangrove sediment was 
collected from a national mangrove reserve in the 
Jiulongjiang estuary (24°24'48.6''N, 117°56'30.5''E), 
Fujian Province, China, in 2009. All samples were kept 
on dry ice during transport and then stored at 
−20 °C. 

Construction of the genomic library 
High-molecular-weight genomic DNA was extracted 
according to the method of Zhou et al. (1996) and 
separated using pulsed-field agarose gel electrophoresis 
after the DNA ends were repaired 
following the manufacturer's instructions (Epicentre, 
Madison, WI, USA). After the electrophoresis was 
completed, an agarose plug containing 33-48 kb 
DNA was cut out, and the DNA was recovered using 
electro-elution (Bio-Rad, Hercules, CA, USA). The 
genomic DNA purified from this plug was ligated to 
the pCC1FOS fosmid or pWEB-TNC cosmid vector and 
packaged using MaxPlax Lambda Packaging 
Extract (Epicentre). The packaged particles were 
transferred into Escherichia coli EPI300 or EPI100 
(Epicentre). In total, ~8000 clones for the estuarine 
sediment and ~9000 clones for the mangrove 
sediment were obtained in this study. The average 
insert size was 35 kb. 



Screening for the archaeal genome fragments 
The library was pooled into groups of 12 clones, and 
the mixed fosmid or cosmid plasmids were 
extracted using a standard alkaline lysis procedure. 
These extracted plasmids were used as templates for 
PCR amplification. Multiplex PCR with the archaeal 16S 
rRNA universal primer set Arch21F/958R (DeLong, 
1992) was used to screen for pools containing 
an archaeal 16S rRNA gene. Plasmids of the 12 individual 
fosmid/cosmid clones from each pool with positive archaeal 16S 
rRNA gene amplification were then extracted and 
used as templates for the second round of PCR 
amplification. The single fosmid/cosmid clones 
containing an archaeal 16S rRNA gene were selected for 
further analysis. 
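
The two-round screening logic described above can be sketched in a few lines of Python. This is only an illustration of the pooling strategy, not the authors' laboratory protocol: the function name, the clone identifiers and the simulated PCR assay are all hypothetical, and a real screen would replace `is_positive` with the outcome of an actual PCR on the pooled or individual plasmid templates.

```python
from typing import Callable, Iterable, List


def screen_pooled_library(
    clone_ids: Iterable[str],
    is_positive: Callable[[List[str]], bool],
    pool_size: int = 12,
) -> List[str]:
    """Return the clones that test positive, using two rounds of testing.

    `is_positive` stands in for a PCR assay: it receives the list of clones
    whose plasmids are mixed in one reaction and reports whether a product
    of the expected size was obtained (hypothetical interface).
    """
    clones = list(clone_ids)
    hits: List[str] = []
    for start in range(0, len(clones), pool_size):
        pool = clones[start:start + pool_size]
        # Round 1: a single reaction on the mixed plasmids of the pool.
        if not is_positive(pool):
            continue
        # Round 2: one reaction per clone, only for pools that were positive.
        hits.extend(clone for clone in pool if is_positive([clone]))
    return hits


# Simulated example (made-up data): two clones carry an archaeal 16S rRNA gene.
archaeal = {"clone_0007", "clone_0030"}


def simulated_pcr(templates: List[str]) -> bool:
    return any(t in archaeal for t in templates)


library = [f"clone_{i:04d}" for i in range(48)]
found = screen_pooled_library(library, simulated_pcr)
print(found)
```

With pools of 12, a library of N clones needs N/12 first-round reactions plus 12 reactions for each positive pool, which is far fewer than N individual reactions when positives are rare.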

Analysis of the metagenome sequences 75G8 and 26B6: 
tRNA genes, Open Reading Frame search and protein 
identification 

Shotgun libraries were sequenced by the Sanger 
sequencing method to determine the complete insert 
sequences of each clone as described before (Meng 
et al., 2009). Open reading frames were predicted 
with GeneMark (Lukashin and Borodovsky, 1998). 
BLAST was used to search for similar sequences in 
GenBank (Altschul et al., 1997) with an E-value 
cutoff of <10"^. In addition, protein annotation 
against Pfam (Sonnhammer et al., 1997) and InterPro 
(Zdobnov and Apweiler, 2001) was performed with 
an e-value <10"^. Signal peptides were scanned 
with SignalP 4.0 (Petersen et al., 2011) and transmembrane 
segments were predicted using TMHMM 
2.0 (http://www.cbs.dtu.dk/services/TMHMM/). The 
conserved domains of predicted protein sequences 
were detected using the program InterProScan 
(Zdobnov and Apweiler, 2001). tRNAs were 
identified using tRNAscan-SE v.1.21 (Lowe 
and Eddy, 1997). 



Phylogenetic analysis of 16S rRNA, LSU-SSU rRNA 
and predicted proteins 

For the 16S rRNA phylogeny, representatives of MCG, 
Marine benthic group B and Marine group I were 
selected from ARB-silva (http://www.arb-silva.de/) 
as reference sequences. LSU-SSU (large-subunit-small-subunit) 
operon sequences (23S rRNA-16S rRNA) of 
Crenarchaeota, Euryarchaeota and Thaumarchaeota from 
GenBank were selected for the LSU-SSU phylogenetic 
tree. MAFFT with the L-INS-i strategy was used for all 
alignments in this paper (Katoh et al., 2002). 
Maximum likelihood phylogenetic trees of aligned 
genes were inferred with RAxML, using the general 
time-reversible model of substitution and the 
GAMMA model of rate heterogeneity; tree topologies 
were checked with 100 bootstrap replicates. 

In the case of ribosomal proteins, each ribosomal 
protein was first aligned with MAFFT and then all 
alignments were concatenated. For the protein phylogenetic 
trees, the best protein model was determined 
with ProteinModelSelection.pl 
(http://sco.h-its.org/exelixis/software.html). LG was selected 
as the best protein model for the ribosomal proteins and 
the topoisomerase IB (TopoIB) protein (Le and Gascuel, 
2008). Maximum likelihood phylogenetic trees were 
constructed with RAxML using the LG 
model of protein substitution and the GAMMA 
model of rate heterogeneity, with 100 bootstrap 
replicates. 
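
The concatenation step for the ribosomal protein alignments can be illustrated with a short Python sketch. It assumes each per-protein alignment is available as a mapping from taxon name to aligned sequence, and it pads taxa that lack a given protein with gap characters; that padding rule, the function name and the toy sequences are assumptions made for illustration and are not taken from the study.

```python
from typing import Dict, List


def concatenate_alignments(
    alignments: List[Dict[str, str]], gap: str = "-"
) -> Dict[str, str]:
    """Join per-protein alignments into one supermatrix, taxon by taxon."""
    taxa = sorted({taxon for aln in alignments for taxon in aln})
    parts: Dict[str, List[str]] = {taxon: [] for taxon in taxa}
    for aln in alignments:
        lengths = {len(seq) for seq in aln.values()}
        if len(lengths) != 1:
            raise ValueError("sequences within one alignment must have equal length")
        width = lengths.pop()
        for taxon in taxa:
            # Taxa missing from this alignment are filled with gaps (assumption).
            parts[taxon].append(aln.get(taxon, gap * width))
    return {taxon: "".join(chunks) for taxon, chunks in parts.items()}


# Toy alignments with made-up sequences for two hypothetical proteins.
protein_a = {"MCG_26B6": "MK-LV", "N_maritimus": "MKALV"}
protein_b = {"MCG_26B6": "GT-", "N_maritimus": "GTA", "C_subterraneum": "GSA"}

supermatrix = concatenate_alignments([protein_a, protein_b])
for taxon, seq in supermatrix.items():
    print(taxon, seq)
```

Every row of the resulting supermatrix has the same length, which is what a maximum likelihood program expects from a concatenated alignment.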



Sediment collection and substrate feeding experiment 
Mangrove sediment from the same location as that 
for the cosmid library was used for the substrate feeding 
experiment in October 2009. First, we withdrew the 
core rod of a sterile 50 ml syringe, and then 
vertically inserted the syringe into the sediment with the tip up. 
After the syringe was filled with sediment, the core 
rod was pushed back to expel the remaining air in the 
syringe. The top tip of each syringe was immediately 
sealed with parafilm and all filled syringes were 
transported on ice to the laboratory. One syringe was 
stored at −80 °C as the original sample. The others were 
used for the incubation experiment. Protocatechuate 
solution was prepared as follows: 0.5 g 
protocatechuate was dissolved in 2 ml of deionized 
water and filtered through a 0.22 µm filter. The 
protocatechuate solution was then injected into the 
syringe from the top tip and seeped into the 
sediment. The tip of the syringe was then sealed 
with parafilm, and the whole syringe was covered 
with foil and incubated in a thermostatic room 
(26 °C) for 45 days. One syringe without injection of 
protocatechuate was used as a control and was 
covered with foil and incubated under the same 
conditions. 



RNA extraction and gene expression 
The original sample, the control sample and each layer (L1-L4) 
from the protocatechuate-supplemented sample 
were used for RNA extraction. Two 
grams of each sediment sample were used for RNA 
extraction with the E.Z.N.A. Soil RNA Kit (Omega 
Bio-Tek, Norcross, GA, USA). Total RNA was treated 
with DNase I at 37 °C for 1 h to remove potential 
DNA contamination. Reverse transcription-PCR was 
performed on the purified total RNA using RevertAid 
H Minus Reverse Transcriptase (Fermentas, Hanover, 
MD, USA) with specific primers (CDS16-forward: 
5'-CCTCGGCGAGCATTTCCGGG-3'; CDS16-reverse: 
5'-GCCCATCGGCAGGAAGGTGG-3'; CDS17-forward: 
5'-CATCACCTGCTTGATGCTCT-3'; CDS17-reverse: 
5'-CGGGAAATTCGTGGAATATG-3'), following the 
manufacturer's instructions. Two microliters of the 
reverse transcription reaction mixture were 
used for the subsequent PCR amplification. The PCR 
cycling conditions were as follows: 98 °C for 30 s, 30 
cycles of 95 °C for 30 s, 55 °C for 30 s and 72 °C for 30 s, 
and a final extension at 72 °C for 7 min. After amplification, 
PCR products were subjected to electrophoresis, and 
PCR bands from agarose gels were purified with the 
E.Z.N.A. Gel Extraction Kit (Omega Bio-Tek). The 
purified PCR amplicons were ligated into the pMD-18T 
vector and transformed into E. coli DH5α. Three 
positive clones for each PCR amplicon were 
sequenced. 

Nucleotide sequence accession number 
The 16S rRNA gene and the genomic sequences from 
this study were deposited in the DDBJ/EMBL/ 
GenBank nucleotide sequence databases under 
accession numbers KF439060 and KF439061. 

Results and discussion 

Metagenomic library construction and screening 
A cosmid library was constructed from mangrove 
sediment from Zhangjiang Mangrove Reservation, 
Fujian Province, China. The mangrove sediment 
used in this study contained abundant MCG archaea, 
as estimated by 16S rRNA gene library analyses (Zhang 
et al., 2009; Li et al., 2012b). The cosmid library 
contained ~9000 clones and the average insert 
length was 35 kb. 

This cosmid library derived from the mangrove 
sediment and a fosmid library constructed from 
estuarine sediment (Meng et al., 2009) were 
screened for MCG clones by PCR amplification. Six 
clones containing archaeal 16S rRNA genes were 
obtained, four of which belonged to the MCG group 



The ISME Journal 



Genetic and functional properties of uncultivated MCG archaea 

J Meng et al 



(as shown in Figure 1). Three clones 37F10, 75G8 
and 26B6 yielded full insert sequences. On the basis 
of the classification and phylogenetic analyses of 



MCG 16S rRNA gene sequences by Jiang et al. 
(2011), 37F10 was grouped into MCG-A, whereas 
75G8 and 26B6 were placed within the MCG-G 



[Figure 1: maximum likelihood 16S rRNA gene tree of MCG archaea and reference sequences. Legend: single cell (*), fosmid/cosmid. Clade labels: (class 6), MCG-A, MCG-B, MCG-C, MCG-G, MCG-E, MCG-D, MCG-F, MG-1 and MBGB. Scale bar: 0.1.]

Figure 1 The phylogenetic tree of uncultivated MCG discussed in the text. The tree was constructed from the alignment of >900 
unambiguously aligned base pairs using MAFFT followed by Maximum likelihood method by RAxML with the GTRGAMMA model. The 
stability of the topology was evaluated by bootstrapping (100 replicates). The resulting bootstrap values are indicated at each node in the 
tree. The names of MCG groups (MCG-A to -G, and class 1-17) were modified based on Jiang et al.'s classification (Jiang et al., 2011) and 
Kubo et al.'s classification (Kubo et al., 2012), respectively. 
subgroup. According to the classification of Kubo et al. (2012), 
37F10 belongs to class 6, whereas 75G8 and 
26B6 belong to class 8 (Figure 1). 



Gene composition and comparative analyses of MCG 
genomic fragments 

Clone 75G8 had an insert size of 33887 bp and 
contained 32 predicted coding 
sequences (CDSs) and one 16S-23S rRNA operon 
(Figure 2, Supplementary Table S1). The G + C 
content of the whole insert was 56.94% and that 
of the 16S rRNA gene was 59.80%. Twenty-five of the 
predicted CDSs could 
be assigned functions, four were identified 
as conserved hypothetical proteins and three 
of the CDSs did not show significant similarity 
to any amino-acid sequences in the protein 
databases. 

Clone 26B6 had an insert size of 34887 bp and 
contained a dispersed 16S rRNA gene and 40 CDSs 
(Figure 2, Supplementary Table S2). The average 
G + C content of the insert sequence was 44.71% 
and that of the 16S rRNA gene was 59.15%. Neither 23S 
nor 5S rRNA genes were found in this fragment. Most 
known archaea have one or a few copies of an 
rRNA operon containing at least both 16S and 23S 
rRNA genes, but the dispersed localization of the 
16S and 23S rRNA genes is common within MCG and 
other archaeal members (Meng et al., 2009; Li et al., 
2012a). Twenty-two of the predicted CDSs could be 
assigned functions, 10 were identified as 
conserved hypothetical proteins and 9 CDSs did 
not show significant similarity to any amino-acid 
sequences in the protein databases (Supplementary 
Table S1). 
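
G + C contents such as those reported above are simple to compute from a nucleotide sequence. The following is a minimal sketch, assuming the sequence is held as a plain string and that ambiguous bases (for example N) are excluded from the denominator; the study does not state which convention was used, and the function name and the test sequences are illustrative only.

```python
def gc_content(sequence: str) -> float:
    """Return the G + C content (%) of a nucleotide sequence.

    Only unambiguous A, C, G and T bases are counted; excluding other
    symbols is an assumption of this sketch.
    """
    seq = sequence.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    if gc + at == 0:
        raise ValueError("sequence contains no A, C, G or T bases")
    return 100.0 * gc / (gc + at)


# Made-up example sequences, not taken from the fosmid inserts.
print(round(gc_content("ATGCGC"), 2))
print(gc_content("GGCCAATTNN"))
```

Applying the same function to a whole insert and to its 16S rRNA gene gives the two values that are compared for each clone.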

The main characteristics of the six available MCG 
fosmid/cosmid clones are listed in Table 1. The 
G + C contents of the listed MCG 16S rRNA genes were 
relatively stable, ranging from 55.9 to 59.8%. In contrast, the 
G + C contents of the whole inserts differed 
by as much as 19.4 percentage points, with the lowest 
G + C content of 37.5% for E37-7F and the highest of 
56.9% for 75G8. The similarities between the 16S rRNA 
genes from the metagenome clones and the single-cell clone 
ranged from 80 to 95% (Table 2), with an average 
similarity of 85%. When the gene organizations of these 
six fosmid/cosmid MCG fragments were compared 
(Figure 2), large variations were evident. 
Even when focusing on the gene organization 
around the 16S rRNA gene in all six MCG 
metagenome clones, no synteny was found 
(Figure 2). This result is consistent with a previous 
report that no colinear regions were found between 
MCG fosmids and any reported archaeal genomic 
fragments or genomes (Li et al., 2012a). 
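
A pairwise 16S rRNA similarity of the kind summarized above can be computed from two aligned sequences. The sketch below uses one common convention, counting identical positions among the columns where neither sequence has a gap; the study does not specify how its similarities were calculated, so this definition, the function name and the toy sequences are assumptions.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity over alignment columns where neither sequence is gapped."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    matches = 0
    for x, y in zip(aligned_a.upper(), aligned_b.upper()):
        if x == gap or y == gap:
            continue  # skip columns with a gap in either sequence
        compared += 1
        if x == y:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


# Toy aligned fragments (made-up), 9 comparable columns with 8 matches.
identity = percent_identity("ACGT-ACGTA", "ACGTTACCTA")
print(round(identity, 2))
```

Other conventions, such as dividing by the full alignment length or by the length of the shorter sequence, give somewhat different values, so the definition should be stated alongside any reported similarity.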

The physiological traits and ecological 
significance of MCG archaea remain unclear. 



[Figure 2: gene maps of the MCG fosmid/cosmid genomic fragments (surviving labels: 37F10, E48_1C, 26B6, E37_7F), with the 16S rRNA genes marked. Legend: COG categories C, E, F, G, H, I, J, K, L, M, O, P, Q, R, S, T and V, plus rRNA, tRNA and NA. Scale bar: 5 kb.]

Figure 2 Comparison of gene organization. The gene organizations of the genomic fragment from six MCG fosmid/cosmid clones were 
compared with each other. The genes are colored according to Clusters of Orthologous Groups (COG) category, and 16S rRNAs are linked 
in gray. 
Table 1 Characteristic summary of MCG fosmid/cosmid clones 

                                        37F10    75G8      26B6        E6-3G          E37-7F         E48-1C
Insert length (bp)                      34 528   33 887    34 877      38 227         42 618         34 738
G + C content (%)                       52.5     56.9      44.7        50.1           37.5           44.9
G + C content of 16S rRNA (%)           58.0     59.8      59.1        57.6           55.9           58.6
rrn operon and tRNA                     16S      16S-23S   16S, tRNA   16S-5.8S-23S   16S-5.8S-23S   16S, tRNA
No. of predicted ORFs                   36       32        40          30             41             37
No. of conserved hypothetical proteins  4        4         10          3              4              4
No. of hypothetical proteins            7        3         9           12             8              13
Average ORF length                      807      756       747         885            696            706
16S rRNA identities to 37F10 (%)        100      87        86          82             82             89

Abbreviations: ORF, Open Reading Frame; MCG, Miscellaneous Crenarchaeota group. 



Previous studies have suggested that MCG archaea are 
distributed in various habitats and exhibit extraordinary 
versatility. The 16S rRNA sequences of MCG 
members vary greatly, exhibiting as low as 76% 
similarity even within groups (Fry et al., 2008; Kubo 
et al., 2012). The comparison of the retrieved MCG 
genomic fragments indicated large variations in genomic 
regions other than the 16S rRNA gene sequences, 
and such high genomic diversity is consistent with the 
high metabolic diversity of MCG archaea suggested by their 
evolutionary diversity (Biddle et al., 2006; Fry et al., 
2008; Teske and Sørensen, 2008). 



MCG phylogenetic analysis based on LSU-SSU rRNA, 
ribosomal proteins and DNA TopoIB gene 
Currently, the MCG cluster exhibits no clear affiliation 
to any of the established archaeal phyla and 
shows an unstable branching order when 16S 
rRNA-based trees are constructed with different 
methods (Pester et al., 2011). LSU-SSU rRNA 
and/or concatenated ribosomal proteins have served 
as robust gene markers for phylogenetic analysis. 
MCG clone 75G8 contains a complete 16S-23S 
rRNA operon, and clone 26B6 contains several 
ribosomal protein genes, which gave us a chance 
to re-examine the phylogenetic relationship of MCG 
with other archaeal groups. 

In the LSU-SSU rRNA phylogenetic tree 
(Figure 3), MCG was clearly shown as a sister lineage 
of Aigarchaeota and Thaumarchaeota. The novel 
archaeal phylum Aigarchaeota was recently 
proposed by Nunoura et al. (2011) based on the 
distinct genomic features (including genes 
encoding a ubiquitin-like protein modifier system) of 
'Candidatus Caldiarchaeum subterraneum', which 
belongs to the HWCGI group. Our LSU-SSU rRNA 
phylogenetic analysis supported the independence 
of Aigarchaeota from Thaumarchaeota and further 
suggested that MCG could constitute a new phylum. 
Moreover, consistent with the LSU-SSU rRNA 
phylogenetic tree, the phylogenetic tree of concatenated 
ribosomal proteins (Supplementary Figure S1) 
also indicated that MCG represents a sister lineage 
of Thaumarchaeota. 

The genomic fragment of clone 26B6 contained a 
putative DNA TopoIB-type protein (CDS1), which 



Table 2 Similarity of 16S rRNA genes between MCG fosmid/ 
cosmid and single-cell clones 
Abbreviation: MCG, Miscellaneous Crenarchaeota group. 
MCGE09 is the single-cell sequencing clone (Lloyd et al., 2013); the 
partial 16S rRNA gene (900 bp) is used here for the pairwise 
comparison. 



showed the highest similarity (40%) to that from 
Nitrosopumilus maritimus SCM1 (Supplementary 
Table S2). Historically, type IB topoisomerases were 
thought to be eukaryote-specific enzymes. A shorter 
version was then found in viruses and later in 
several bacteria (Forterre et al., 2007), but these 
genes were not found in any archaea until they were 
recently identified in members of the proposed archaeal phyla 
Thaumarchaeota and Aigarchaeota (Brochier-Armanet 
et al., 2008; Nunoura et al., 2011). TopoIB 
from MCG clone 26B6 formed a sister group with 
those from Thaumarchaeota and Aigarchaeota, 
forming an archaeal branch independent of those 
from Eukarya and viruses (Figure 4). The archaeal 
topoisomerase IB does not seem to have been acquired via 
lateral gene transfer from Eukarya, but instead might 
have been present in the last common 
ancestor. 

According to the phylogenetic analyses of 
LSU-SSU rRNA, ribosomal proteins and the TopoIB 
gene, MCG is clearly shown as a sister 
lineage of Thaumarchaeota and Aigarchaeota, 
and it is likely that they evolved from a common 
ancestor. In a recently published partial genome 
(with 30% genome recovery) obtained from a 
single cell of an MCG member (MCGE09, as 
shown in Figure 1), initial phylogenetic analyses 
using single-copy genes in archaea also placed 
MCG as a sister lineage of Thaumarchaeota 
and Aigarchaeota (Lloyd et al., 2013). All these 
Figure 3 The maximum likelihood tree based on the LSU-SSU sequences from archaea and bacteria. All sequences were retrieved 
from whole genomes or from environmental genomic fragments that contain an LSU-SSU operon. In the tree, 75G8 indicates the 
genomic fragment obtained in this study. The sequences of the LSU-SSU operon were aligned using MAFFT with the L-INS-i strategy. The 
maximum likelihood tree was computed with the RAxML program using the general time-reversible (GTR) model of sequence evolution 
with a gamma correction. The numbers at the nodes represent the non-parametric bootstrap values that were computed by 
RAxML. 



lines of evidence indicate that MCG does not belong to the Crenarchaeota 
and that it occupies a deep branching position with 
Thaumarchaeota and Aigarchaeota. Therefore, 
MCG is likely to represent a novel archaeal 
phylum, and we propose to name the new phylum 
'Bathyarchaeota' (from the Greek 'bathys', 
meaning deep, as it branches deeply 
with Thaumarchaeota and Aigarchaeota and is 
frequently detected in deep subsurface 
sediments). More precise phylogenetic placement 
Figure 4 Unrooted maximum likelihood phylogenetic tree of TopoIB. TopoIB sequences are from Thaumarchaeota, viruses and eukaryotes. The 
numbers at the branches represent the bootstrap proportions. The scale bar represents the average number of substitutions per site. 



of MCG requires isolates and more genomes of 
MCG members in the future. 



Genes for aromatic compound degradation and 
expression verification 

Within the genome fragment of 75G8, one CDS, 
CDS21, shared the highest identity (43% protein 
identity, e-value = 4e-90) with a methyl-accepting 
chemotaxis protein (MCP) from Nitrobacter winogradskyi 
Nb-255 (Supplementary Table S1). MCPs 
are a family of receptors that mediate chemotaxis 
toward diverse signals, allowing organisms to 
respond to changes in the concentration of attractants 
and repellents in the environment by altering 
swimming behavior (Szurmant and Ordal, 2004). 
Two adjacent CDSs (CDS16 and CDS17), located 
upstream of the MCP gene in the MCG clone 75G8, 
were identified as putative 4-carboxymuconolactone 
decarboxylases (CMDs), as they matched the protein family 
HMM PF02627 (Supplementary Table S1). CMDs 
catalyze the third step in the catabolism of protocatechuate 
(and therefore the fourth step in the 
catabolism of para-hydroxybenzoate, 3-hydroxybenzoate, 
vanillate and other compounds). 
Specifically, CMDs catalyze the decarboxylation of carboxymuconolactone, 
yielding β-ketoadipate enol-lactone, in 
the catabolism of aromatic compounds through the 
protocatechuate branch of the β-ketoadipate pathway 
(Stanier and Ornston, 1973). 

On fosmid clone 75G8, the putative CMD genes 
involved in protocatechuate catabolism are located 
upstream of the MCP gene. These putative 
MCG-CMD proteins show the highest sequence 
identity to putative CMDs from bacterial strains 
(Supplementary Table S1 and Supplementary Figure 
S2). CMD genes are widely distributed in 
various bacteria but are rarely found in the available 
archaeal genomes (the presence of the CMD gene has 
only been observed in Sulfolobus and Methanomicrobia 
so far). The MCG-CMD seems to have a 
bacterial origin (Supplementary Figure S2). As 
only very limited information is available, no 
firm conclusion can be drawn. Nevertheless, 
the presence of genes for both CMD and 
MCP in one genome fragment strongly suggests that 
MCG members (here, the MCG-G subcluster) may 
have the ability to utilize aromatic compounds. To 
test this hypothesis, we performed a substrate 
feeding cultivation experiment in which the source 
Figure 5 (a) A photo of the syringe filled with sediment and protocatechuate after 45 days of incubation. The labels from top to bottom 
(L1, L2, L3 and L4) correspond to the colors of the sediment layers. (b) Reverse transcription PCR (RT-PCR) analysis of 75G8_CDS16 and 
75G8_CDS17 from the RNA extracted from different layers of the incubated syringe (L1-L4), original sediment (B1) and a control sample 
(B2). L1-L4 indicate the different layers of the syringe sediment. B1 represents the original sediment sample without any treatment and 
B2 represents the control sample that was incubated under the same conditions but without protocatechuate. P indicates the positive 
control that used 75G8 fosmid DNA as the PCR template, and N indicates the negative control that used water as the template. 



sediment was supplemented with protocatechuate 
as substrate. Fresh sediment from the mangrove 
reserve, the same location where the 
sediment for metagenome library construction was collected, 
was sampled with a syringe as shown in Figure 5a. 
Then, protocatechuate solution was 
injected into the syringe from the top tip and 
seeped into the sediment core (see Materials and 
methods for details). The syringe was then sealed 
and incubated in a thermostatic room (26 °C) for 45 
days. The incubated sediment core was cut into four 
portions according to the color stratification 
(Figure 5a). Total RNA was extracted to examine the 
expression of the MCG-CMD genes (CDS16 and CDS17). 
The expression of CDS16 was clearly observed only 
in the protocatechuate-supplemented sediment 
layer L1, whereas the expression of CDS17 was 
observed in both L1 and L2. In contrast, no 
expression of either CDS16 or CDS17 was observed 
in the original sediment sample or the control 
sample without protocatechuate supplementation 
(Figure 5b and Materials and methods). The PCR 
bands of these two MCG-CMD genes were recovered 
from the gel, cloned and sequenced, and the 
sequences were identical to the CMD sequences in the 
75G8 genomic fragment. Therefore, the expression 
of the CMD genes was stimulated by protocatechuate. 
The results of this preliminary substrate feeding 
experiment and gene expression analysis strongly 
support our hypothesis that MCG archaea can 
utilize protocatechuate as a substrate. A previous 
study considered MCG archaea to be heterotrophic 
anaerobes on the basis of the depleted levels of 
the stable isotope 13C (−15 to −28‰) in whole archaeal 
cells and intact archaeal membrane lipids (Biddle 
et al., 2006). Recently, one member of the MCG 
was suggested to be capable of 
protein degradation (Lloyd et al., 2013). Here, we 
identified another putative substrate: protocatechuate. 
It seems very likely that MCG members 
of different MCG subgroups have divergent 
substrate-utilizing capabilities, considering that 
MCG subgroups have extremely high genomic diversity, 
as demonstrated above. Identifying other 
possible substrates in addition to protocatechuate 
by stable isotope probing and/or single-cell sequencing 
represents an exciting new avenue of MCG 
research that may help elucidate the physiological 
properties of these organisms and facilitate 
their isolation. 